{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TransactionEncoder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Encoder class for transaction data in Python lists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> from mlxtend.preprocessing import TransactionEncoder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Encodes database transaction data in form of a Python list of lists into a NumPy array."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Suppose we have the following transaction data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mlxtend.preprocessing import TransactionEncoder\n",
    "\n",
    "dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
    "           ['Apple', 'Beer', 'Rice'],\n",
    "           ['Apple', 'Beer'],\n",
    "           ['Apple', 'Bananas'],\n",
    "           ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
    "           ['Milk', 'Beer', 'Rice'],\n",
    "           ['Milk', 'Beer'],\n",
    "           ['Apple', 'Bananas']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using and `TransactionEncoder` object, we can transform this dataset into an array format suitable for typical machine learning APIs. Via the `fit` method, the `TransactionEncoder` learns the unique labels in the dataset, and via the `transform` method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy boolean array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ True, False,  True,  True, False,  True],\n",
       "       [ True, False,  True, False, False,  True],\n",
       "       [ True, False,  True, False, False, False],\n",
       "       [ True,  True, False, False, False, False],\n",
       "       [False, False,  True,  True,  True,  True],\n",
       "       [False, False,  True, False,  True,  True],\n",
       "       [False, False,  True, False,  True, False],\n",
       "       [ True,  True, False, False, False, False]], dtype=bool)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "te = TransactionEncoder()\n",
    "te_ary = te.fit(dataset).transform(dataset)\n",
    "te_ary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The NumPy array is boolean for the sake of memory efficiency when working with large datasets. If a classic integer representation is desired instead, we can just convert the array to the appropriate type: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 1, 1, 0, 1],\n",
       "       [1, 0, 1, 0, 0, 1],\n",
       "       [1, 0, 1, 0, 0, 0],\n",
       "       [1, 1, 0, 0, 0, 0],\n",
       "       [0, 0, 1, 1, 1, 1],\n",
       "       [0, 0, 1, 0, 1, 1],\n",
       "       [0, 0, 1, 0, 1, 0],\n",
       "       [1, 1, 0, 0, 0, 0]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "te_ary.astype(\"int\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After fitting, the unique column names that correspond to the data array shown above can be accessed via the `columns_` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "te.columns_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For our convenience, we can turn theencoded array into a pandas `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Apple</th>\n",
       "      <th>Bananas</th>\n",
       "      <th>Beer</th>\n",
       "      <th>Chicken</th>\n",
       "      <th>Milk</th>\n",
       "      <th>Rice</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Apple  Bananas   Beer  Chicken   Milk   Rice\n",
       "0   True    False   True     True  False   True\n",
       "1   True    False   True    False  False   True\n",
       "2   True    False   True    False  False  False\n",
       "3   True     True  False    False  False  False\n",
       "4  False    False   True     True   True   True\n",
       "5  False    False   True    False   True   True\n",
       "6  False    False   True    False   True  False\n",
       "7   True     True  False    False  False  False"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "pd.DataFrame(te_ary, columns=te.columns_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we desire, we can turn the one-hot encoded array back into a transaction list of lists via the `inverse_transform` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Apple', 'Beer', 'Chicken', 'Rice'],\n",
       " ['Apple', 'Beer', 'Rice'],\n",
       " ['Apple', 'Beer'],\n",
       " ['Apple', 'Bananas']]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first4 = te_ary[:4]\n",
    "te.inverse_transform(first4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## TransactionEncoder\n",
      "\n",
      "*TransactionEncoder()*\n",
      "\n",
      "Encoder class for transaction data in Python lists\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "None\n",
      "\n",
      "**Attributes**\n",
      "\n",
      "columns_: list\n",
      "List of unique names in the `X` input list of lists\n",
      "\n",
      "**Examples**\n",
      "\n",
      "For usage examples, please see\n",
      "[http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/](http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/)\n",
      "\n",
      "### Methods\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit(X)*\n",
      "\n",
      "Learn unique column names from transaction DataFrame\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit_transform(X, sparse=False)*\n",
      "\n",
      "Fit a TransactionEncoder encoder and transform a dataset.\n",
      "\n",
      "<hr>\n",
      "\n",
      "*get_params(deep=True)*\n",
      "\n",
      "Get parameters for this estimator.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `deep` : boolean, optional\n",
      "\n",
      "    If True, will return the parameters for this estimator and\n",
      "    contained subobjects that are estimators.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `params` : mapping of string to any\n",
      "\n",
      "    Parameter names mapped to their values.\n",
      "\n",
      "<hr>\n",
      "\n",
      "*inverse_transform(array)*\n",
      "\n",
      "Transforms an encoded NumPy array back into transactions.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `array` : NumPy array [n_transactions, n_unique_items]\n",
      "\n",
      "    The NumPy one-hot encoded boolean array of the input transactions,\n",
      "    where the columns represent the unique items found in the input\n",
      "    array in alphabetic order\n",
      "\n",
      "    For example,\n",
      "```\n",
      "    array([[True , False, True , True , False, True ],\n",
      "    [True , False, True , False, False, True ],\n",
      "    [True , False, True , False, False, False],\n",
      "    [True , True , False, False, False, False],\n",
      "    [False, False, True , True , True , True ],\n",
      "    [False, False, True , False, True , True ],\n",
      "    [False, False, True , False, True , False],\n",
      "    [True , True , False, False, False, False]])\n",
      "```\n",
      "    The corresponding column labels are available as self.columns_,\n",
      "    e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "```\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "```\n",
      "\n",
      "<hr>\n",
      "\n",
      "*set_params(**params)*\n",
      "\n",
      "Set the parameters of this estimator.\n",
      "\n",
      "The method works on simple estimators as well as on nested objects\n",
      "(such as pipelines). The latter have parameters of the form\n",
      "``<component>__<parameter>`` so that it's possible to update each\n",
      "component of a nested object.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "self\n",
      "\n",
      "<hr>\n",
      "\n",
      "*transform(X, sparse=False)*\n",
      "\n",
      "Transform transactions into a one-hot encoded NumPy array.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : list of lists\n",
      "\n",
      "    A python list of lists, where the outer list stores the\n",
      "    n transactions and the inner list stores the items in each\n",
      "    transaction.\n",
      "\n",
      "    For example,\n",
      "    [['Apple', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Apple', 'Beer', 'Rice'],\n",
      "    ['Apple', 'Beer'],\n",
      "    ['Apple', 'Bananas'],\n",
      "    ['Milk', 'Beer', 'Rice', 'Chicken'],\n",
      "    ['Milk', 'Beer', 'Rice'],\n",
      "    ['Milk', 'Beer'],\n",
      "    ['Apple', 'Bananas']]\n",
      "\n",
      "    sparse: bool (default=False)\n",
      "    If True, transform will return Compressed Sparse Row matrix\n",
      "    instead of the regular one.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `array` : NumPy array [n_transactions, n_unique_items]\n",
      "\n",
      "    if sparse=False (default).\n",
      "    Compressed Sparse Row matrix otherwise\n",
      "    The one-hot encoded boolean array of the input transactions,\n",
      "    where the columns represent the unique items found in the input\n",
      "    array in alphabetic order. Exact representation depends\n",
      "    on the sparse argument\n",
      "\n",
      "    For example,\n",
      "    array([[True , False, True , True , False, True ],\n",
      "    [True , False, True , False, False, True ],\n",
      "    [True , False, True , False, False, False],\n",
      "    [True , True , False, False, False, False],\n",
      "    [False, False, True , True , True , True ],\n",
      "    [False, False, True , False, True , True ],\n",
      "    [False, False, True , False, True , False],\n",
      "    [True , True , False, False, False, False]])\n",
      "    The corresponding column labels are available as self.columns_, e.g.,\n",
      "    ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.preprocessing/TransactionEncoder.md', 'r') as f:\n",
    "    print(f.read())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
